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Abstract. We consider the problem of optimizing a real-valued contin- 
uous function / using a Bayesian approach, where the evaluations of / 
are chosen sequentially by combining prior information about /, which is 
described by a random process model, and past evaluation results. The 
main difficulty with this approach is to be able to compute the posterior 
distributions of quantities of interest which are used to choose evaluation 
points. In this article, we decide to use a Sequential Monte Carlo (SMC) 
approach. 

1 Overview of the contribution proposed 

We consider the problem of finding the global maxima of a function / : X — > 
R, where X C R'' is assumed bounded, using the expected improvement (EI) 
criterion [1,3]. Many examples in the literature show that the EI algorithm is 
particularly interesting for dealing with the optimization of functions which are 
expensive to evaluate, as is often the case in design and analysis of computer 
experiments [2]. However, going from the general framework expressed in [1] to 
an actual computer implementation is a difficult issue. 

The main idea of an El-based algorithm is a Bayesian one: / is viewed as a 
sample path of a random process defined on K^. For the sake of tractability, 
it is generally assumed that ^ has a Gaussian process distribution conditionally 
to a parameter 6* G C R^, which tunes the mean and covariance functions of 
the process. Then, given a prior distribution ttq on 6 and some initial evaluation 
results ^ {Xi ),..., ^ [Xno ) at ^'Ci , . . . , X^a , an (idealized) EI algorithm constructs 
a sequence of evaluations points X„j,+i, X„j,+2, ■ ■ ■ such that, for each n > uq, 



where 7r„ stands for the posterior distribution of 6, conditional on the cr-algebra 
J"„ generated by Xi, $(Xi ),..., X„, ^(X„), and 



is the EI at x given 9, with M„ = ^(-''^o) V ■ • • V £,{Xn) and E„.e the conditional 
expectation given J-n and 9. In practice, the computation of p„ is easily car- 
ried out (see [3]) but the answers to the following two questions will probably 




(1) 



p^{x-9) := E„,e((e(^„+i) -M„)+ | X, 



n+l 



have a direct impact on the performance and apphcabihty of a particular im- 
plementation: a) How to deal with the integral in p„? b) How to deal with the 
maximization of Pn at each step? 

We can safely say that most implementations — including the popular EGO al- 
gorithm [3] — deal with the first issue by using an empirical Bayes (or plug-in) 
approach, which consists in approximating 7r„ by a Dirac mass at the maximum 
likelihood estimate of 9. A plug-in approach using maximum a posteriori estima- 
tion has been used in [6] ; fully Bayesian methods are more difficult to implement 
(see [4] and references therein). Regarding the optimization of p„ at each step, 
several strategies have been proposed (see, e.g., [3,5,7,10]). 

This article addresses both questions simultaneously, using a sequential Monte 
Carlo (SMC) approach [8, 9] and taking particular care to control the numer- 
ical complexity of the algorithm. The main ideas are the following. First, as 
in [5], a weighted sample In = {{On,i,i'Jn,i) £ O x R,l < i < 1} from 7r„ is 
used to approximate that is, X]i=i Pnix;0n,i) -^i Pn{x)- Besides, at 
each step n, we attach to each 9n,i a (small) population of candidate evaluation 
points {Xn,i,j-, 1 < J < J} which is expected to cover promising regions for that 
particular value of 9 and such that max,j p„ {xn,i,j) ~ max^, Pn{x). 

2 Algorithm and results 

At each step n > uq of the algorithm, our objective is to construct a set of 

weighted particles 

ln,i,j = {9n,i,Xn,i,j) G © X X, € R , 1 < i < J, 1 < j < J } (2) 

SO that Wn,ijS^n,i,j -^i,J with 

d<(7) = gn{x\9)dX{x)dTrn{9), xGlL, 0gO, ^={9,x), 

where A denotes the Lebesgue measure, gn{x \ 6) = gn{x \ 6)/cn{9), gn{x | ^) is a 
criterion that reflects the interest of evaluating at x (given 9 and past evaluation 
results), and c„(0) = J^gn{x \ 0)dx is a normalizing term. For instance, a 
relevant choice for gf„ is to consider the probability that ^ exceeds M„ at x, at 
step n. (Note that we consider less ^s than xs in 25„ to keep the numerical 
complexity of the algorithm low.) 

To initialize the algorithm, generate a weighted sample T„o = {{9no,iT''^no,i)j 
!<«</} from the distribution 7r„(,, using for instance importance sampling 

with ttq as the instrumental distribution, and pick a density over X (the 
uniform density, for example). Then, for each n > uq: 

Step 1: demarginalize — Using T„ and qn, construct a weighted sample 
of the form (2), with Xn,i,j ~ Qn, w'n,i,j = '^n<i^i}§^~~ji~' ^^'^ ^^'i = 



Step 2: evaluate - Evaluate ^ at = argmaXj ^j,^-^ u>„.i' Pn{xn.i,j\Sn,i')- 

Step 3: reweight/resample/move — Construct T„+i from T„ as in [8]: reweight 
the 6n,iS using Wn+i,i oc '^^'^(g^"')^ Wn,i, resample (e.g., by multinomial resam- 
pling), and move the On^iS to get ^„+i,iS using an independant Metropolis- 
Hastings kernel. 

Step 4- forge Qn+i — Form an estimate Qn+i of the second marginal of tt^ from 
the weighted sample X„ = ^ j), 1 < i < 1,1 < j < J}- Hopefully, 

such a choice of Qn+i will provide a good instrumental density for the next 
demarginalization step. Any (parametric or non-parametric) density estimator 
can be used, as long as it is easy to sample from; in this paper, a tree-based 
histogram estimator is used. 

Nota bene: when possible, some components of 6 are integrated out analytically 
in (1) instead of being sampled from; see [4]. 

Experiments. Preliminary numerical results, showing the relevance of a fully 

Bayesian approach with respect to empirical Bayes approach, have been provided 
in [4]. The scope of these results, however, was limited by a rather simplistic im- 
plementation (involving a quadrature approximation for p„ and a non-adaptive 
grid-based optimization for the choice of Xn+i)- We present here some results 
that demonstrate the capability of our new SMC-based algorithm to overcome 
these limitations. 

The experimental setup is as follows. We compare our SMC-based algorithm, 
with I = J = 100, to an EI algorithm in which: 1) we fix 9 (at a "good" value 
obtained using maximum likelihood estimation on a large dataset); 2) Xn+i is 
obtained by exhaustive search on a fixed LHS of size I x J. In both cases, we 
consider a Gaussian process with a constant but unknown mean function (with 
a uniform distribution on R) and an anisotropic Matern covariance function with 
regularity parameter ly = 5/2. Moreover, for the SMC approach, the variance 
parameter of the Matern covariance fimction is integrated out using a Jeffreys 
prior and the range parameters are endowed with independent lognormal priors. 

Results. Figures 1(a) and 1(b) show the average error over 100 runs of both 
algorithms, for the Branin function {d = 2) and the log-transformed Hartmann 6 
function {d — Q). For the Branin function, the reference algorithm performs bet- 
ter on the first iterations, probably thanks to the "hand-tuned" parameters, but 
soon stalls due to its non-adaptive search strategy. Our SMC-based algorithm, 
however, quickly catches up and eventually overtakes the reference algorithm. 
On the Hartmann 6 function, we observe that the reference algorithm always 
lags behind our new algorithm. 

We have been able to find results of this kind for other test functions. These 
findings are promising and need to be further investigated in a more systematic 
large-scale benchmark study. 
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(a) Branin function (dimension 2) 
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(b) Hartmann 6 function (dimension 6) 



Fig. 1. A comparison of the average error to the maximum (100 runs) 
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